CD8 TIL projection uncover cells types hidden by transient gene programs

Cell-Cycle, interferon and Tissue-residency gene programs

Background

Cells responding to external cues might overexpress transient gene programs. In scRNA-seq data, clusters will be driven by the strongest variation component, which can be sometimes transient gene programs. These programs such as the cell cycle, might hide the original, potentially more stable cell type beneath.

These are among the main types of program that classically are being upregulated in scRNA-seq data:

  • Cell cycle, when cells go through division (eg. MCM5, TOP2A)

  • IFN response, in response to IFN stimulation (eg. MX1, ISG15)

  • HSP response, in response to stress (eg. HSPA1, HSPA2)

  • Activation response, in response to activation signal, eg TCR engagement (eg. JUN, FOS)

  • Tissue-residency program, when cell infiltrate and home within tissues (eg. ZNF683, ITGAE)

Note: while some programs are clearly transient (like the cell cycle), some programs might be more permanent / epigenetically fixed. The distinction between transient and permanent might be harder for some programs, like tissue-residency. We argue here than having a well-curated reference, free from many clearly transient gene programs, will help in distinguishing transient from more stable cell states.

Here we will see how we use reference-based projection to uncover the original cell types from IFN-activated, Cycling and Tissue-resident cells using data from Gueguen et al. 2021.

How to identify gene programs?

Coregulated gene sets can also be found using unsupervised techniques. Gene modules can be found using PCA, ICA, cNMF, scCoGAPS, or find_gene_modules from Monocle3, which runs UMAP on the genes and then groups them into modules using Louvain clustering. Here, for simplicity, we chose to focus on easily explainable, explicit programs.

Data setup

Loading the CD8 reference

First let’s load the CD8 TIL reference by downloading it from Figshare.

# Load the reference
options(timeout = max(900, getOption("timeout")))
#download.file("https://figshare.com/ndownloader/files/38921366", destfile = "CD8T_human_ref_v1.rds")
ref.cd8 <- load.reference.map("CD8T_human_ref_v1.rds")
## [1] "Loading Custom Reference Atlas..."
## [1] "Loaded Custom Reference map Human CD8 TILs"
# Setup colors
mycols <- ref.cd8@misc$atlas.palette

# Plotting
DimPlot(ref.cd8,  group.by = 'functional.cluster', label = T, repel = T, cols = mycols) + theme(aspect.ratio = 1)

We see that the reference is composed only of what should only be stable cell types. Indeed, cells expressing highly transient gene programs (Cycling and IFN) were removed when constructing the reference.

Load Gueguen et al. data

#download.file("https://figshare.com/ndownloader/files/39082049", destfile = "gueguen.cd3.Rds")
gueguen.cd3 <- readRDS("gueguen.cd3.Rds")
gueguen.cd3$seurat_clusters <- Idents(gueguen.cd3)

# DimPlot
DimPlot(gueguen.cd3, order = T,  label = T, repel = T) 

Projection using reference map

The first step is to project using ProjecTILs ProjecTILs.classifier function. This mode will give us the labels, while keeping the original UMAP embeddings.

# Projection
DefaultAssay(gueguen.cd3) <- "RNA"
gueguen.cd3 <- ProjecTILs.classifier(gueguen.cd3, ref = ref.cd8, filter.cells = T, split.by = 'orig.ident', ncores = 6)
table(gueguen.cd3$functional.cluster)
## 
##        CD8.CM        CD8.EM      CD8.MAIT CD8.NaiveLike     CD8.TEMRA 
##          2132          3181           147           238           276 
##       CD8.TEX      CD8.TPEX 
##          3340           337
# Plotting
DimPlot(gueguen.cd3, group.by = 'functional.cluster', order = T, cols = mycols, label = T, repel = T)

Augmenting cell type classifications with gene program scores

To further augment granularity, here we divide the dataset by computing signatures scores using UCell. To this end, we use our collection of useful transcriptional gene signatures: SignatuR, as well as manually curated signatures.

Here, we will use signatures generated for:

  • G1S cell cycle

  • G2M cell cycle

  • IFN response

  • Tissue residency

library(SignatuR)
signatures <- GetSignature(SignatuR$Hs)
signatures <- signatures[c("cellCycle.G1S","cellCycle.G2M")]

# Using manual signatures for IFN and resident, which work better with UCell than the full signatures from SignatuR package
signatures[["IFN"]] <- c("ISG15","IFI6","IFI44L","MX1")
signatures[["Tissue_Resident"]] <-  c("ITGAE")

Now let’s add the scores using UCell.

gueguen.cd3 <- AddModuleScore_UCell(gueguen.cd3, features = signatures, ncores = 8,
                       assay = "RNA")
Should I use unsupervised clusters or UCell scores to measure gene programs?

It all depends on the dataset that you have. If you have annotated your dataset with ProjecTILs, you won’t have any cluster driven by transient programs. In this case, you can use UCell signature scoring. Else, if you have clusters clearly driven by transient gene programs, you can use ProjecTILs mapping to unravel the underlying cell types within these clusters of interest.

FeaturePlot(gueguen.cd3, features = paste0(names(signatures), "_UCell"),
    order = T, pt.size = 0.3) & scale_color_paletteer_c("pals::coolwarm") 
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.

We see that these signatures correlate with their annotation of the original paper.

Now, let’s check signatures scores distribution to decide which threshold put to distinguish negative/positive cells.

Signatures scores distributions
qplot(gueguen.cd3$cellCycle.G1S_UCell, main = "G1S-Cycling") +
    qplot(gueguen.cd3$cellCycle.G2M_UCell, main = "G2M-Cycling") + qplot(gueguen.cd3$IFN_UCell, main = "IFN") + qplot(gueguen.cd3$Tissue_Resident_UCell, main = "Tissue Resident") & theme_bw()

# subset resident high cells in the CD8 reference
resident.high <- subset(gueguen.cd3, subset = Tissue_Resident_UCell > 0.2)
resident.low <- subset(gueguen.cd3, subset = Tissue_Resident_UCell <= 0.2)

# Set list
resident.list <- list(resident.low,resident.high)
names(resident.list) <- c("resident.low","resident.high")
# Radars to check differences between resident / non resident
plot.states.radar(ref = ref.cd8, query = resident.list, labels.col = 'functional.cluster', min.cells = 5, genes4radar = c("TCF7","CCR7","KLF2","GZMK","LMNA","FCGR3A","FGFBP2","XCL1","KLRB1", 'ZNF683',"ITGAE", 'ITGA1')) 

Resident-high and resident-low cells display the same overall phenotypes, but differ in their expression of key resident genes, upregulated when cells home whithin tissue, such as integrins (ITGAE, ITGA1) or the transcription factor HOBIT (ZNF683).

Indeed, the radars confirm that the annotation seems of good quality. Radar are good way to show that phenotypes match between the reference and the query, but they differ by the resident genes (ZNF683, ITGA1, ITGAE and CXCR6). We also confirm that the cluster CD8-XCL1, originally identified as an early-precursors, indeed match well the phenotype of Central Memory cells, while still being high for tissue-residency signal.

By looking at the UCell scores distribution, we categorised the cells by putting a UCell score threshold of 0.2.

wrap_plots(pll, guides = "collect")

As previously shown, CD8.TEX and CD8.TPEX are the most cycling, resident and IFN-responding subsets. For the IFN signal, results are inconsistent between UCell signature scores and projected clustes. Results will vary depending on the signatures chosen, the UCell threshold chosen, as well as the clustering resolution.

Upset plot to check programs cooccurences

Let’s focus on the CD8.TEX population, and see how the different gene programs relate using a UpSet plot.

gueguen.cd3.TEX <- subset(gueguen.cd3, subset = functional.cluster == "CD8.TEX")
listInput <- list(IFN = which(gueguen.cd3.TEX$IFN.class == "Positive"), 
                  Cycling.G2M = which(gueguen.cd3.TEX$cellCycle.G2M.class == "Positive"), 
                  Cycling.G1S = which(gueguen.cd3.TEX$cellCycle.G1S.class == "Positive"),
                  Tissue_resident = which(gueguen.cd3.TEX$Tissue_Resident.class == "Positive"))


upset(fromList(listInput), order.by = "freq") 
grid.text("CD8.TEX gene programs overlap",x = 0.65, y=0.95, gp=gpar(fontsize=12))

We can see that in CD8.TEX, broadly half of the IFN high cells are also high for the tissue-residency program. This should be explored further.

Comparison with the original annotation

In the original study, the authors use Seurat TransferData function to tranfer labels from the non-cycling clusters to the cycling clusters (‘CD4/8-TOP2A’, ‘CD4/8-MCM5’). With label tranfer, Cycling cells are defined as CD8.LAYN and CD8.GZMH.

Now let’s compare this result with our projection results. First, let’s select the 2 cycling clusters.

gueguen.cycling <- subset(gueguen.cd3, subset = seurat_clusters %in% c('CD4/8-TOP2A', "CD4/8-MCM5"))

# Dimplot
DimPlot(gueguen.cycling, group.by = 'functional.cluster', label = T, repel = T, cols = mycols)
## Warning: ggrepel: 1 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

Clusters proportions in cycling

plot.statepred.composition(ref.cd8, gueguen.cycling, metric = "Percent") +
    ggtitle("Cycling cells") + ylim(0, 75) + theme_bw() + RotatedAxis()

Let’s check radars plots to confirm identities.

plot.states.radar(ref = ref.cd8, query = gueguen.cycling, labels.col = 'functional.cluster', min.cells = 5, genes4radar = c('IL7R','LEF1', "TCF7","CCR7","KLF2","GZMK","FCGR3A","FGFBP2","XCL1","KLRB1",'TOP2A','MCM5','MCM3'))

Radar plots look good. It seems that we have indeed less differentiated cells in the cycling compartment (Central Memory and effector memory). Projection seems more robust than Seurat label transfer to uncover cell types hidden behind transient gene programs.

Session Info
sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices datasets  utils     methods  
## [8] base     
## 
## other attached packages:
##  [1] SignatuR_0.1.0      UpSetR_1.4.0        cowplot_1.1.1      
##  [4] plotly_4.10.1       pheatmap_1.0.12     UCell_2.2.0        
##  [7] paletteer_1.5.0     scales_1.2.1        ggrepel_0.9.2      
## [10] gridExtra_2.3       ProjecTILs_3.0.2    patchwork_1.1.2    
## [13] GEOquery_2.66.0     Biobase_2.58.0      BiocGenerics_0.44.0
## [16] data.table_1.14.6   STACAS_2.0.0        scGate_1.4.1       
## [19] forcats_0.5.2       stringr_1.5.0       dplyr_1.0.10       
## [22] purrr_1.0.1         readr_2.1.3         tidyr_1.2.1        
## [25] tibble_3.1.8        ggplot2_3.4.0       tidyverse_1.3.2    
## [28] SeuratObject_4.1.3  Seurat_4.3.0       
## 
## loaded via a namespace (and not attached):
##   [1] utf8_1.2.2                  spatstat.explore_3.0-5     
##   [3] reticulate_1.27             tidyselect_1.2.0           
##   [5] htmlwidgets_1.6.1           BiocParallel_1.32.5        
##   [7] Rtsne_0.16                  munsell_0.5.0              
##   [9] codetools_0.2-18            ica_1.0-3                  
##  [11] umap_0.2.9.0                future_1.30.0              
##  [13] miniUI_0.1.1.1              withr_2.5.0                
##  [15] spatstat.random_3.1-2       colorspace_2.1-0           
##  [17] progressr_0.13.0            highr_0.10                 
##  [19] knitr_1.41                  rstudioapi_0.14            
##  [21] stats4_4.2.1                SingleCellExperiment_1.20.0
##  [23] ROCR_1.0-11                 tensor_1.5                 
##  [25] listenv_0.9.0               labeling_0.4.2             
##  [27] MatrixGenerics_1.10.0       GenomeInfoDbData_1.2.9     
##  [29] polyclip_1.10-4             farver_2.1.1               
##  [31] parallelly_1.34.0           vctrs_0.5.2                
##  [33] generics_0.1.3              xfun_0.36                  
##  [35] timechange_0.2.0            R6_2.5.1                   
##  [37] GenomeInfoDb_1.34.7         rmdformats_1.0.4           
##  [39] pals_1.7                    bitops_1.0-7               
##  [41] spatstat.utils_3.0-1        cachem_1.0.6               
##  [43] DelayedArray_0.24.0         assertthat_0.2.1           
##  [45] promises_1.2.0.1            googlesheets4_1.0.1        
##  [47] gtable_0.3.1                globals_0.16.2             
##  [49] goftest_1.2-3               rlang_1.0.6                
##  [51] splines_4.2.1               lazyeval_0.2.2             
##  [53] gargle_1.2.1                dichromat_2.0-0.1          
##  [55] prismatic_1.1.1             spatstat.geom_3.0-5        
##  [57] broom_1.0.2                 BiocManager_1.30.19        
##  [59] yaml_2.3.7                  reshape2_1.4.4             
##  [61] abind_1.4-5                 modelr_0.1.10              
##  [63] backports_1.4.1             httpuv_1.6.8               
##  [65] tools_4.2.1                 bookdown_0.32              
##  [67] ellipsis_0.3.2              jquerylib_0.1.4            
##  [69] RColorBrewer_1.1-3          ggridges_0.5.4             
##  [71] Rcpp_1.0.10                 plyr_1.8.8                 
##  [73] zlibbioc_1.44.0             RCurl_1.98-1.9             
##  [75] openssl_2.0.5               deldir_1.0-6               
##  [77] pbapply_1.7-0               S4Vectors_0.36.1           
##  [79] zoo_1.8-11                  SummarizedExperiment_1.28.0
##  [81] haven_2.5.1                 cluster_2.1.4              
##  [83] fs_1.6.0                    magrittr_2.0.3             
##  [85] RSpectra_0.16-1             scattermore_0.8            
##  [87] lmtest_0.9-40               reprex_2.0.2               
##  [89] RANN_2.6.1                  googledrive_2.0.0          
##  [91] fitdistrplus_1.1-8          matrixStats_0.63.0         
##  [93] hms_1.1.2                   mime_0.12                  
##  [95] evaluate_0.20               xtable_1.8-4               
##  [97] readxl_1.4.1                IRanges_2.32.0             
##  [99] compiler_4.2.1              maps_3.4.1                 
## [101] KernSmooth_2.23-20          crayon_1.5.2               
## [103] htmltools_0.5.4             later_1.3.0                
## [105] tzdb_0.3.0                  lubridate_1.9.0            
## [107] DBI_1.1.3                   dbplyr_2.3.0               
## [109] MASS_7.3-58.2               data.tree_1.0.0            
## [111] Matrix_1.5-3                cli_3.6.0                  
## [113] parallel_4.2.1              igraph_1.3.5               
## [115] GenomicRanges_1.50.2        pkgconfig_2.0.3            
## [117] sp_1.6-0                    spatstat.sparse_3.0-0      
## [119] xml2_1.3.3                  bslib_0.4.2                
## [121] XVector_0.38.0              rvest_1.0.3                
## [123] digest_0.6.31               pracma_2.4.2               
## [125] sctransform_0.3.5           RcppAnnoy_0.0.20           
## [127] spatstat.data_3.0-0         rmarkdown_2.20             
## [129] cellranger_1.1.0            leiden_0.4.3               
## [131] uwot_0.1.14                 shiny_1.7.4                
## [133] lifecycle_1.0.3             nlme_3.1-161               
## [135] jsonlite_1.8.4              BiocNeighbors_1.16.0       
## [137] mapproj_1.2.9               askpass_1.1                
## [139] viridisLite_0.4.1           limma_3.54.0               
## [141] fansi_1.0.4                 pillar_1.8.1               
## [143] lattice_0.20-45             fastmap_1.1.0              
## [145] httr_1.4.4                  survival_3.5-0             
## [147] glue_1.6.2                  png_0.1-8                  
## [149] stringi_1.7.12              sass_0.4.5                 
## [151] rematch2_2.1.2              renv_0.15.5                
## [153] irlba_2.3.5.1               future.apply_1.10.0